EFFICIENT HASHING METHOD


Background of the Invention 


Field of the Invention 


The present invention relates to data manipulation; more specifically, the 
invention relates to an efficient technique for representing long strings of data as shorter 
strings of data. 


Description of the Related Art 

Hashing is a technique for representing longer lengths of data as shorter lengths of
data. The techniques are such that there is a relatively small probability that two different
longer lengths of data will be represented as identical short lengths of data. This
probability is called the probability of collision.

$$\Pr(h(m_1) = h(m_2)) \le \varepsilon \tag{1}$$

The probability of collision is represented by Equation (1), which indicates that the
probability of a hashing function "h" performed on a string $m_1$ being equal to the result of
a hashing function "h" performed on a different string $m_2$ is less than or equal to $1/2^l$ or ε.

The number of bits contained in the longer unhashed string is "n" and is called the
domain. The number of bits in the shorter or hashed string is "l" and is often referred to
as the range of the hashing function. A hashing function that satisfies Equation (1) is
often referred to as ε universal.

$$\Pr(h(m_1) - h(m_2) = \Delta) \le \varepsilon \tag{2}$$

Another property typically associated with hashing functions is represented by
Equation (2), which indicates that the probability of the difference between the output of
a hashing function "h" on string $m_1$ and the output of the hashing function on string $m_2$ being
equal to some preselected number Δ is less than or equal to $1/2^l$ or ε. Hashing functions
that satisfy Equation (2) are typically referred to as ε-Δ universal hash functions.


$$\Pr(h(m_1) = c_1,\ h(m_2) = c_2) \le \frac{\varepsilon}{2^l} \tag{3}$$

Some hash functions also have a third property illustrated by Equation (3).
Equation (3) shows that the joint probability of the output of hashing function "h" for
input string $m_1$ being equal to a predetermined number $c_1$ and the output of hashing
function "h" for input string $m_2$ being equal to a predetermined number $c_2$ is less than or
equal to $\varepsilon/2^l$. A hashing function that satisfies Equation (3) is referred to as ε strongly universal.
Hashing functions that satisfy Equation (3) automatically satisfy Equations (1) and (2). 

Hashing functions are used in many applications, one of which is to simplify
searching for text strings. When used for searching for text strings, the hashing function
is used to reduce the size of the stored information and then the same hashing function is
used to reduce the size of the search criteria. The shortened search criteria are then used to
search the shortened stored information to more efficiently locate a desired piece of
information. Once the desired piece of information has been located, the unhashed or full
length text associated with the shortened text can be provided.
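
The text-search use described above can be sketched as follows. This is only an illustration: the functions `short_hash`, `build_index` and `lookup` are hypothetical names, and the standard-library BLAKE2b digest truncated to 2 bytes stands in for whatever hashing function an application would actually use.

```python
import hashlib
from collections import defaultdict


def short_hash(text: str) -> bytes:
    """Reduce a text string to a 16-bit value (stand-in hashing function)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=2).digest()


def build_index(records):
    """Store each full-length record under its shortened (hashed) form."""
    index = defaultdict(list)
    for record in records:
        index[short_hash(record)].append(record)
    return index


def lookup(index, query: str):
    """Hash the search criteria the same way, then return matching full-length text."""
    candidates = index.get(short_hash(query), [])
    # Different strings may share a short hash, so the full text is compared as well.
    return [record for record in candidates if record == query]


index = build_index(["alpha", "beta", "gamma"])
print(lookup(index, "beta"))
```

Comparing the full text after the hashed lookup guards against the small but non-zero probability of collision.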

Hashing functions are also used in wireless communications for message
authentication. A message is authenticated by sending a message string along with a tag, 
calculated by performing a cryptographic function on the message. Forming a tag of a 
message string is computationally intensive. Hash functions are used to shorten the 
message to a tag so that the cryptographic processing required is less intense. 


Techniques such as linear hashing illustrated by Equation (4) and MMH hashing 
illustrated by Equation (5) are now used to represent longer strings of data or text as 
shorter strings where the probability of two different long strings producing the same 
short string is relatively small. These hashing functions require a multiplication of a key 
that is "w" words long by a "w" words long message or text that is to be hashed. As a 
result, $w^2$ operations are required to perform a hashing of a particular string of data or
text. For large strings of data or text having many words, this results in a 
computationally intensive operation. 

Summary of the Invention 

The present invention provides an efficient hashing technique that uses $(w^2 + w)/2$
operations to hash a string "w" words long rather than the $w^2$ operations of the prior art.
The present invention achieves this efficiency by squaring the sum of the key and the
string to be hashed rather than forming a product of the key and the string to be hashed.
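
The two operation counts can be illustrated by counting word-by-word products in schoolbook arithmetic. The sketch below rests on an assumption not spelled out in the text: that the saving comes from each cross product of two distinct words being computed once and doubled when a number is squared. The function names are illustrative only.

```python
def multiply_word_products(w: int) -> int:
    """Word products needed to multiply two different w-word numbers: every pair (i, j)."""
    return sum(1 for i in range(w) for j in range(w))


def square_word_products(w: int) -> int:
    """Word products needed to square one w-word number: only pairs with i <= j,
    because word i times word j equals word j times word i."""
    return sum(1 for i in range(w) for j in range(i, w))


for w in (1, 2, 4, 8):
    assert multiply_word_products(w) == w * w
    assert square_word_products(w) == (w * w + w) // 2
    print(w, multiply_word_products(w), square_word_products(w))
```

For large "w" the squaring count approaches half of the multiplication count.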


$$h(m) = (ma) \bmod p \tag{4}$$

$$h(m) = \left((m + a)^2 \bmod p\right) \bmod 2^l \tag{6}$$


In one embodiment of the invention, as illustrated by Equation (6), a hashing of a
message "m" is performed by summing the message string with a key string "a" and then
forming the square of that summation. A modular "p" operation is performed on the result
of the squaring operation and then a modular $2^l$ operation is performed on the result of the
modular "p" operation. In this case, both "m" and "a" are of the same length, that is,
"n" bits or "w" words long. It should be noted that "a" may be longer than "n" bits, but
"n" bits is preferable. The value "l" refers to the length in bits of the shortened string that
results from the hashing and is referred to as the range. The value "p" is selected as the
first prime number greater than $2^n$ where "n" is the number of bits in the message string
"m". It should be noted that Equation (6) provides a hashing method that satisfies
Equations (1) and (2), that is, the hashing method of Equation (6) is Δ universal.
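
A minimal sketch of the square hash of Equation (6) follows. The names `next_prime_above` and `square_hash` are hypothetical, the prime search uses simple trial division and is only practical for small "n", and the key is drawn with Python's `secrets` module purely for illustration.

```python
import math
import secrets


def next_prime_above(x: int) -> int:
    """Return the smallest prime strictly greater than x (trial division, small x only)."""
    candidate = max(x + 1, 2)
    while any(candidate % d == 0 for d in range(2, math.isqrt(candidate) + 1)):
        candidate += 1
    return candidate


def square_hash(m: int, a: int, n: int, l: int) -> int:
    """Equation (6): ((m + a)^2 mod p) mod 2^l, with p the first prime above 2^n."""
    p = next_prime_above(2 ** n)
    return (((m + a) ** 2) % p) % (2 ** l)


n, l = 16, 8                      # 16-bit message reduced to an 8-bit hash
a = secrets.randbelow(2 ** n)     # random n-bit key
m = 0x1234                        # example message
print(square_hash(m, a, n, l))
```

Python integers have arbitrary precision, so the sketch does not show the word-level arithmetic whose cost the invention reduces.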



$$h(m) = \left(\left((m + a)^2 + b\right) \bmod p\right) \bmod 2^l \tag{7}$$


In the second embodiment of the present invention, a strongly universal hashing
method is provided. In this case, message string "m" is summed with key "a" and then
the resulting sum is squared. Both message string "m" and key "a" are "w" words long,
containing a total of "n" bits. It should be noted that key "a" may contain more than "n"
bits, but "n" is preferable. The result of the squaring operation is then summed with a
second key "b" which is at least "n" bits long. A modular "p" operation is performed on
the sum of the squared term and key "b" as discussed above with regard to Equation (6).
A modular $2^l$ operation is performed on the result of the modular "p" operation as was
described with regard to Equation (6). Using this hashing method provides a strongly
universal hashing method that satisfies Equations (1), (2) and (3).
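
The second embodiment can be sketched in the same way. In this illustration the prime "p" is passed in as a parameter rather than computed, the example value 65537 is the first prime greater than $2^{16}$, and the function name is hypothetical.

```python
import secrets

P_16 = 65537  # first prime greater than 2**16 (assumes n = 16 for this sketch)


def strongly_universal_square_hash(m: int, a: int, b: int, p: int, l: int) -> int:
    """Equation (7): (((m + a)^2 + b) mod p) mod 2^l."""
    s = m + a            # sum of message and first key
    sq = s * s           # square of the sum
    return ((sq + b) % p) % (2 ** l)


n, l = 16, 8
a = secrets.randbelow(2 ** n)   # first random key
b = secrets.randbelow(2 ** n)   # second random key
print(strongly_universal_square_hash(0x1234, a, b, P_16, l))
```

The only change from the first embodiment is the addition of key "b" before the modular "p" operation.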

In yet another embodiment of the present invention, "k" messages or strings are
hashed so that a single shorter string is produced.

$$h(m_1, \ldots, m_k) = \left(\left(\sum_{i=1}^{k} (m_i + a_i)^2\right) \bmod p\right) \bmod 2^l \tag{8}$$


Equation (8) illustrates the hashing function where "k" messages, each of which
is "w" words long, are hashed to form a single shorter string. Each message $m_i$ is summed
with a key $a_i$ and the resulting sum is squared. The result of the squaring operation for
each message $m_i$ is then summed over the "k" messages. A modular "p" operation is
performed on the overall sum, and a modular $2^l$ operation is performed on the result of
the modular "p" operation. The values "p" and "l" are once again defined as described
above. The hashing method illustrated by Equation (8) produces a Δ universal hashing
function that satisfies Equations (1) and (2).

Brief Description of the Drawing

FIG. 1 is a flowchart of a square hashing method;

FIG. 2 is a flowchart of a strongly universal square hashing method; and

FIG. 3 is a flowchart of a second Δ universal square hashing method.

Detailed Description

FIG. 1 illustrates a method for carrying out the square hashing method of
Equation (6). In step 100 an input string or message "m" is inputted. In step 102 an
input key "a" is inputted. The message or string "m" and the key "a" are each "n" bits
long consisting of "w" words. Key "a" is a random or pseudo-random number and may
be longer than "n" bits, but "n" bits is preferable. In step 104 the sum "s" of string "m"
and key "a" is formed. In step 106 sum "s" is squared. In step 108 a modular "p"
operation is performed on the result of step 106. "p" is the next prime number larger than
$2^n$; however, "p" may be a larger prime, which may degrade performance. In step 110 a
modular $2^l$ operation is performed on the result of step 108. "l" is the number of bits in
the short output message or string. In step 112 the result of the modular $2^l$ operation is
outputted. The process of FIG. 1 results in a message or string of "n" bits being reduced
to a message or string of "l" bits. It should be noted that the process associated with FIG.
1 executes an ε-Δ universal hash function that satisfies the properties of Equations (1) and
(2).

FIG. 2 illustrates a method for carrying out the strongly universal hashing method
described by Equation (7). In step 140 a message or string "m" is inputted. In step 142
keys "a" and "b" are inputted. Message "m", key "a" and key "b" are each "n" bits long
having "w" words. In step 144 the sum of message "m" and key "a" is formed and stored
as sum "s". In step 146 the square of sum "s" is stored as term "SQ". In step 148 the
sum of the term "SQ" and key "b" is formed. In step 150 a modular "p" operation is
performed on the result produced by step 148. Once again, "p" is equal to the next prime
number greater than $2^n$; however, "p" may be a larger prime, which may degrade
performance. In step 152 a modular $2^l$ operation is performed on the result from step 150.
"l" is equal to the number of bits in the string or message to be outputted by this method.
In step 154 the short message or string of length "l" is outputted. It should be noted that
the method of FIG. 2 reduces a string or message of "n" bits to a string or message of "l"
bits. It should also be noted that the process of FIG. 2 is an ε strongly universal hash
function that satisfies the properties of Equations (1), (2) and (3).

FIG. 3 illustrates a method for performing the ε-Δ universal hashing method
described by Equation (8). In step 170 index "i" is set equal to 1 and the variable SUM is
set equal to 0. In step 172 the value of "k" is inputted. "k" is equal to the number of
strings or messages that will be inputted to produce a single shortened message. In step
174 message or string $m_i$ is inputted, and in step 176 input key $a_i$ is inputted. It should
be noted that message or string $m_i$ and input key $a_i$ are of equal length and have "n" bits
composing "w" words. Key $a_i$ is a random or pseudo-random number and may be
longer than "n" bits, but "n" bits is preferable. Preferably, $a_i$ is a random number.
Random numbers can be generated from many sources such as pseudo-random
generators. In step 178 sum $s_i$ is formed by forming the sum of message $m_i$ and key $a_i$.
In step 180 the square of $s_i$ is set equal to variable $SQ_i$. In step 182 the variable SUM is
set equal to the variable SUM plus $SQ_i$. In step 184 the value of "i" is checked to
determine if it is equal to the value "k". If it is not equal to the value "k", step 186 is
executed where the value of index "i" is incremented by 1 and then step 174 is
executed. If in step 184 the value of "i" is determined to be equal to "k", step 188 is
executed where a modular "p" operation is performed on the current value of the variable
SUM. As discussed previously, the value "p" is the next prime number greater than the
value $2^n$; however, "p" may be a larger prime, which may degrade performance. In step
190 a modular $2^l$ operation is performed on the results produced in step 188. Once again,
"l" is the number of bits composing the output string or message. In step 192 the
shortened message or string of "l" bits is outputted. It should be noted that the process of
FIG. 3 reduces "k" messages of "n" bits each to one message of "l" bits. It should also
be noted that the hashing method of FIG. 3 is an ε-Δ universal hashing method that satisfies
the properties of Equations (1) and (2).
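
The loop of FIG. 3 can be sketched as below, with the flowchart step numbers noted in comments. The function name is hypothetical, the prime "p" is supplied by the caller, and the messages and keys are passed as two equal-length lists instead of being inputted one at a time.

```python
def multi_square_hash(messages, keys, p: int, l: int) -> int:
    """Equation (8): ((sum of (m_i + a_i)^2) mod p) mod 2^l over k messages."""
    if len(messages) != len(keys):
        raise ValueError("one key a_i is needed for each message m_i")
    total = 0                              # step 170: SUM = 0
    for m_i, a_i in zip(messages, keys):   # steps 174, 176, 184, 186: loop over i = 1..k
        s_i = m_i + a_i                    # step 178: sum of message and key
        sq_i = s_i * s_i                   # step 180: square of the sum
        total += sq_i                      # step 182: accumulate into SUM
    reduced = total % p                    # step 188: modular p operation
    return reduced % (2 ** l)              # steps 190, 192: modular 2^l operation, output


# Three 8-bit messages with p = 257, the first prime greater than 2**8.
print(multi_square_hash([1, 2, 3], [4, 5, 6], 257, 4))
```

Only one modular "p" operation is needed for all "k" messages, because the reduction is applied to the accumulated sum.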

In reference to FIGS. 1, 2 and 3, it should be noted that the value "l" is typically
chosen based on a trade-off between desiring a short output message of length "l" and the
desire to minimize the probabilities of Equations (1) and (2) and, in the case of an ε
strongly universal hash function, Equation (3).

The following section provides an abbreviated proof showing that the disclosed
squaring hash functions satisfy the properties for Equations (1), (2) and (3).


Theorem 1: The hashing function described by Equation (6) is Δ-universal.

Proof: For all $m \ne n \in Z_p$ and $\Delta \in Z_p$ (in this proof "n" denotes a second message
rather than a bit length, and x denotes the key):

$$
\begin{aligned}
& \Pr_x[h_x(m) - h_x(n) = \Delta] && (1) \\
&= \Pr_x[(m + x)^2 - (n + x)^2 = \Delta] && (2) \\
&= \Pr_x[m^2 - n^2 + 2(m - n)x = \Delta] && (3) \\
&= 1/p && (4)
\end{aligned}
$$

where the last equality follows since for any given $m \ne n \in Z_p$ and $\Delta \in Z_p$ there
is a unique x which satisfies the equation $m^2 - n^2 + 2(m - n)x = \Delta$.
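
The counting argument of Theorem 1 can be checked exhaustively for a small modulus. The sketch below is an illustration with hypothetical function names, not part of the proof. It works in $Z_p$ before any modular $2^l$ reduction and confirms that exactly one key x out of "p" gives each difference Δ when "p" is an odd prime.

```python
from itertools import product


def count_keys(m: int, n: int, delta: int, p: int) -> int:
    """Number of keys x in Z_p with (m + x)^2 - (n + x)^2 = delta (mod p)."""
    return sum(1 for x in range(p) if ((m + x) ** 2 - (n + x) ** 2) % p == delta)


def is_delta_universal(p: int) -> bool:
    """True if every pair m != n and every delta is hit by exactly one key x."""
    return all(
        count_keys(m, n, delta, p) == 1
        for m, n, delta in product(range(p), repeat=3)
        if m != n
    )


print(is_delta_universal(13))   # odd prime modulus
print(is_delta_universal(9))    # composite modulus, where 2(m - n) need not be invertible
```

The check fails for a composite modulus because the coefficient 2(m - n) then lacks an inverse for some pairs, which is why a prime "p" is used.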

Theorem 2: The hashing function described by Equation (7) is a strongly universal
family of hash functions.

Proof: Follows as an immediate corollary of the following lemma, which shows how to
convert any Δ-universal family of hash functions into a strongly universal family of
hash functions.

Lemma 1: Let $H = \{h_x : D \to R \mid x \in K\}$, where R is an abelian group and K is the
set of keys, be a Δ-universal family of hash functions. Then
$H' = \{h'_{x,b} : D \to R \mid x \in K, b \in R\}$ defined by $h'_{x,b}(m) = h_x(m) + b$ (where the addition
is the operation under the group R) is a strongly universal family of hash functions.

Proof: For all $m \ne n \in D$ and all $\alpha, \beta \in R$:

$$
\begin{aligned}
& \Pr_{x,b}[h'_{x,b}(m) = \alpha,\ h'_{x,b}(n) = \beta] && (5) \\
&= \Pr_{x,b}[h_x(m) + b = \alpha,\ h_x(n) + b = \beta] && (6) \\
&= \Pr_{x,b}[h_x(m) - h_x(n) = \alpha - \beta,\ b = \alpha - h_x(m)] && (7) \\
&= 1/|R|^2 && (9)
\end{aligned}
$$

The last equation follows since $h_x$ is a Δ-universal hash function and $h_x(m) - h_x(n)$ can
take on any value in R with equal probability.
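
The numbering of the proof jumps from (7) to (9). One way to fill the intermediate step, offered as a reading aid and not as the original text, is to factor the joint probability. It assumes that key b is drawn uniformly from R independently of x, and that the Δ-universal family attains the value $1/|R|$ for every difference:

$$
\begin{aligned}
\Pr_{x,b}[h_x(m) - h_x(n) = \alpha - \beta,\ b = \alpha - h_x(m)]
&= \Pr_x[h_x(m) - h_x(n) = \alpha - \beta] \cdot \Pr_b[b = \alpha - h_x(m) \mid x] \\
&= \frac{1}{|R|} \cdot \frac{1}{|R|} = \frac{1}{|R|^2}
\end{aligned}
$$

For every fixed x the value $\alpha - h_x(m)$ is a single element of R, so a uniformly chosen b equals it with probability $1/|R|$.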


